On the Complexity of Rule Discovery from Distributed Data
Abstract
This paper analyses the complexity of rule selection for supervised learning in distributed scenarios. Rule selection is usually guided by a utility measure such as predictive accuracy or weighted relative accuracy; other examples are support and confidence, known from association rule mining. A common strategy for selecting rules from distributed data is to evaluate them locally on each dataset. While this works well for homogeneously distributed data, this work proves limitations of the strategy when the local distributions are allowed to deviate. Identifying the subsets for which local and global distributions deviate may itself be regarded as an interesting learning task, one that explicitly takes the locality of data into account. This task can be shown to be essentially as complex as discovering the globally best rules from local data. Based on the theoretical results, some guidelines for algorithm design are derived.
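As an illustration of why local evaluation can mislead when distributions deviate, the following minimal sketch computes support, confidence, and weighted relative accuracy for a single rule "A -> C" on two sites and on the pooled data. The formulas are the standard textbook definitions; the class name `Counts`, the helper functions, and all counts are hypothetical and chosen only for illustration. This is not the paper's own construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Contingency counts for one rule 'A -> C' on one dataset (hypothetical)."""
    n: int     # number of examples in the dataset
    n_a: int   # examples covered by the antecedent A
    n_c: int   # examples that carry the target class C
    n_ac: int  # examples covered by A that also carry class C

    def __add__(self, other: "Counts") -> "Counts":
        # Pooling two disjoint datasets simply adds their counts.
        return Counts(self.n + other.n, self.n_a + other.n_a,
                      self.n_c + other.n_c, self.n_ac + other.n_ac)


def support(c: Counts) -> float:
    """P(A and C): fraction of examples covered by A with class C."""
    return c.n_ac / c.n if c.n else 0.0


def confidence(c: Counts) -> float:
    """P(C | A): fraction of covered examples with class C."""
    return c.n_ac / c.n_a if c.n_a else 0.0


def wracc(c: Counts) -> float:
    """Weighted relative accuracy: P(A) * (P(C | A) - P(C))."""
    if c.n == 0 or c.n_a == 0:
        return 0.0
    return (c.n_a / c.n) * (c.n_ac / c.n_a - c.n_c / c.n)


# Made-up counts: the class prior differs strongly between the two sites.
site1 = Counts(n=100, n_a=50, n_c=90, n_ac=45)
site2 = Counts(n=100, n_a=10, n_c=10, n_ac=1)
pooled = site1 + site2

for name, c in [("site 1", site1), ("site 2", site2), ("pooled", pooled)]:
    print(f"{name}: support={support(c):.3f} "
          f"confidence={confidence(c):.3f} wracc={wracc(c):.3f}")
```

With these made-up counts the rule has a weighted relative accuracy of zero on each site, because its confidence equals the local class prior, yet about 0.08 on the pooled data. A purely local evaluation would therefore discard a rule that is useful globally.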
Similar resources
Measurement of Complexity and Comprehension of a Program Through a Cognitive Approach
The inherent complexity of software systems creates problems in the software engineering industry. Numerous techniques have been designed to comprehend the fundamental characteristics of software systems. To understand software, it is necessary to know the complexity level of its source code. Cognitive informatics plays an important role in better understanding the complexity o...
FUZZY GRAVITATIONAL SEARCH ALGORITHM AN APPROACH FOR DATA MINING
The concept of intelligently controlling the search process of the gravitational search algorithm (GSA) is introduced to develop a novel data mining technique. The proposed method is called fuzzy GSA miner (FGSA-miner). First, a fuzzy controller is designed to adaptively control the gravitational coefficient and the number of effective objects, two important parameters which play major ro...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...
Distributed and Cooperative Compressive Sensing Recovery Algorithm for Wireless Sensor Networks with Bi-directional Incremental Topology
Recently, the problem of compressive sensing (CS) has attracted a lot of attention in the area of signal processing, and much of the research in this field now addresses it. One of the applications where CS could be used is wireless sensor networks (WSNs). A WSN consists of many low-power wireless sensors. This requires that any improved algorithm for this appli...
Towards a Cost-Effective Parallel Data Mining Approach
Massive rule induction has recently emerged as a powerful data mining technique. The problem is known to be exponential in the size of the attributes and, given its ever-increasing use, can greatly benefit from parallelization. In this paper, we study cost-effective approaches to parallelize rule generation algorithms. In particular, we consider the propositional rule generation algor...
Publication date: 2005